This project came out of a simple frustration. Every chat application we use daily — WhatsApp, Telegram, Instagram DMs — does one thing well: deliver messages fast. But the moment you need to fix your grammar before sending, translate what someone said, or adjust your tone for a formal conversation, you leave the app. You open Google Translate in one tab, Grammarly in another, maybe even ChatGPT to rephrase something. The flow breaks every single time.
We wanted to see if we could build a messaging platform where all of that intelligence lives inside the chat itself. Not as a separate bot you have to talk to, but as tools wired directly into the input box and message bubbles. You type a message, tap a button, and it gets rewritten in a professional tone. You receive a message in French, and you translate it without leaving the conversation. You want to send a custom image — just describe it in a text prompt.
We built the system using the MERN stack with Socket.IO handling real-time message delivery. For the AI features, we chose Google Gemini and Stability AI — both accessible through REST APIs, both with free tiers that fit a student project budget of exactly zero rupees. Media uploads go through Cloudinary. Location sharing uses the browser’s built-in Geolocation API.
The system works. Messages arrive instantly. Grammar correction, tone modification, translation, and summarization all function as expected through the Gemini API. Image generation through Stability AI takes around ten seconds but produces usable results. We also implemented typing indicators, read receipts with blue ticks, and emoji reactions, features that users expect from any modern chat app but that are surprisingly tricky to get right with Socket.IO room management. This paper covers the architecture, what we built, what broke along the way, and what we learned from it.
Introduction
This paper describes a student-built real-time chat application that integrates AI features directly into messaging. It addresses a common problem: users constantly switch between chat apps, translators, and AI tools for simple tasks such as rewriting a message or translating text. The goal is to bring these functions into a single, seamless chat experience.
The system is built on the MERN stack (MongoDB, Express, React, Node.js), with Socket.IO providing real-time communication over WebSockets. It supports typing indicators, read receipts, emoji reactions, media sharing, and location sharing. Media uploads are stored in Cloudinary. Authentication uses JSON Web Tokens (JWT), and passwords are hashed with bcrypt.
A key innovation is the integration of AI APIs (Google Gemini and Stability AI) directly into the chat system. These APIs provide real-time grammar correction, tone adjustment, translation, summarization, sentiment analysis, and image generation. All AI processing is handled server-side with caching and rate limiting to improve performance and reduce API costs.
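The TypeScript sketch below illustrates one way server-side caching and rate limiting of AI requests could be combined. It is not the project's actual code: the in-memory storage, the five-minute cache lifetime, the limit of ten calls per minute, and every name in it (`createTtlCache`, `createRateLimiter`, `createAiGateway`, `callModel`) are assumptions made for illustration. The upstream call to Gemini or Stability AI is passed in as a plain function so that the sketch depends on no external library.

```typescript
// Hypothetical sketch: names, limits, and storage are assumptions, not the project's code.

/** Small in-memory cache whose entries expire after a fixed time-to-live. */
function createTtlCache(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, { value: string; expiresAt: number }>();
  return {
    get(key: string): string | undefined {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key); // expired: drop the entry and report a miss
        return undefined;
      }
      return entry.value;
    },
    set(key: string, value: string): void {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}

/** Sliding-window limiter: at most `limit` calls per user within any `windowMs` span. */
function createRateLimiter(limit: number, windowMs: number, now: () => number = Date.now) {
  const calls = new Map<string, number[]>(); // userId -> timestamps of recent calls
  return {
    tryAcquire(userId: string): boolean {
      const t = now();
      const recent = (calls.get(userId) ?? []).filter((ts) => t - ts < windowMs);
      const allowed = recent.length < limit;
      if (allowed) recent.push(t);
      calls.set(userId, recent);
      return allowed;
    },
  };
}

/** Wraps an upstream AI call (for example a Gemini request) with the cache and limiter. */
function createAiGateway(
  callModel: (task: string, text: string) => Promise<string>,
  cache = createTtlCache(5 * 60 * 1000),
  limiter = createRateLimiter(10, 60 * 1000),
) {
  return async function run(userId: string, task: string, text: string): Promise<string> {
    const key = `${task}:${text.trim()}`;
    const cached = cache.get(key);
    if (cached !== undefined) return cached; // cache hits do not consume quota
    if (!limiter.tryAcquire(userId)) throw new Error("AI rate limit exceeded");
    const result = await callModel(task, text);
    cache.set(key, result);
    return result;
  };
}
```

In this sketch the cache is consulted before the limiter, so a repeated request (the same task on the same text) is answered without counting against the user's quota. Plain maps lose their contents on restart and are not shared between server processes, so a deployment with more than one Node.js instance would need a shared store instead.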
The system also includes a structured architecture with REST APIs for standard operations and Socket.IO for real-time events like messaging and status updates. Advanced chat features such as read receipts and typing indicators are implemented using efficient event broadcasting techniques.
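As an illustration of what economical event broadcasting can mean for these two features, the TypeScript sketch below throttles typing events on the sending side and batches read receipts into a single list of changed message IDs. It is a sketch under stated assumptions rather than the project's implementation: the event names `typing` and `stopTyping`, the two-second interval, the `ChatMessage` shape, and the function names are all hypothetical, and `emit` stands in for whatever function sends an event to the other members of a conversation.

```typescript
// Hypothetical sketch: event names, interval, and data shapes are assumptions.

type Emit = (event: string, payload: { chatId: string; userId: string }) => void;

/** Sends at most one "typing" event per chat every `intervalMs`, however fast the user types. */
function createTypingNotifier(
  emit: Emit,
  userId: string,
  intervalMs = 2000,
  now: () => number = Date.now,
) {
  const lastSent = new Map<string, number>(); // chatId -> time of the last "typing" event
  return {
    onKeystroke(chatId: string): void {
      const t = now();
      const last = lastSent.get(chatId);
      if (last !== undefined && t - last < intervalMs) return; // still inside the quiet period
      lastSent.set(chatId, t);
      emit("typing", { chatId, userId });
    },
    onStop(chatId: string): void {
      if (!lastSent.has(chatId)) return; // nothing was announced, so nothing to retract
      lastSent.delete(chatId);
      emit("stopTyping", { chatId, userId });
    },
  };
}

interface ChatMessage {
  id: string;
  senderId: string;
  readBy: string[];
}

/**
 * Marks messages as read by `readerId` and returns only the IDs that changed,
 * so one batched event can be broadcast instead of one event per message.
 */
function markRead(messages: ChatMessage[], readerId: string): string[] {
  const changed: string[] = [];
  for (const message of messages) {
    const alreadyRead = message.readBy.indexOf(readerId) !== -1;
    if (message.senderId === readerId || alreadyRead) continue;
    message.readBy.push(readerId);
    changed.push(message.id);
  }
  return changed;
}
```

Because `markRead` skips messages the reader has already seen, calling it again from a second session of the same user returns an empty list, and no duplicate receipt needs to be broadcast.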
The project emphasizes practical engineering challenges, including API key security, JWT handling issues, and real-time synchronization. It highlights how integrating AI into communication tools requires not just model access, but careful system design.
Conclusion
We set out to build a chat application where AI assistance is not a separate experience but something woven into the messaging flow. The result is a working platform that handles real-time communication through Socket.IO, integrates Google Gemini for text intelligence and Stability AI for image generation, and implements the chat mechanics that users expect from any modern messaging app.
The hardest part was not the AI integration; calling an API is straightforward once you have the plumbing right. The hardest parts were the real-time features: getting typing indicators to feel responsive without flooding the socket server, making read receipts update correctly across multiple concurrent sessions, and handling the edge cases in emoji reaction toggling. These are small problems individually, but they consumed more development time than the entire AI pipeline.
We learned more building this than we expected, not just about the technologies, but about how many small decisions go into making software feel polished rather than just functional.
References
[1] S. Hrastinski, “Asynchronous and synchronous e-learning,” Educ. Quarterly, vol. 31, no. 1, pp. 51–55, 2008.
[2] J. Wei et al., “Emergent abilities of large language models,” Trans. Mach. Learn. Res., 2022.
[3] T. Brown et al., “Language models are few-shot learners,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, 2020, pp. 1877–1901.
[4] V. Pimentel and B. G. Nickerson, “Communicating and displaying real-time data with WebSocket,” IEEE Internet Comput., vol. 16, no. 4, pp. 45–53, Jul./Aug. 2012.
[5] MongoDB Inc., “Mongoose Documentation,” 2025. [Online]. Available: https://mongoosejs.com/docs/guide.html
[6] Google, “Gemini API Documentation,” 2025. [Online]. Available: https://ai.google.dev/gemini-api/docs
[7] Stability AI, “Platform API Reference,” 2025. [Online]. Available: https://platform.stability.ai/docs/api-reference
[8] Cloudinary, “Cloudinary Documentation,” 2025. [Online]. Available: https://cloudinary.com/documentation
[9] D. M. West, “Education technology can help address inequality in a high inequality world,” Brookings Institution, TechTank blog, 2022.
[10] M. Jones, J. Bradley, and N. Sakimura, “JSON Web Token (JWT),” RFC 7519, IETF, May 2015. [Online]. Available: https://datatracker.ietf.org/doc/html/rfc7519
[11] N. Provos and D. Mazières, “A future-adaptable password scheme,” in Proc. USENIX Annu. Tech. Conf. (ATEC), 1999, pp. 81–91.
[12] R. T. Fielding and R. N. Taylor, “Principled design of the modern web architecture,” ACM Trans. Internet Technol., vol. 2, no. 2, pp. 115–150, 2002.
[13] J. Nielsen, “Response times: The 3 important limits,” Nielsen Norman Group, 1993. [Online]. Available: https://www.nngroup.com/articles/response-times-3-important-limits/
[14] Socket.IO, “Socket.IO Documentation,” 2025. [Online]. Available: https://socket.io/docs/v4/
[15] Meta Open Source, “React Documentation,” 2025. [Online]. Available: https://react.dev